Set Up Neon Environment On Android Platform

This post is a guide to setting up a Neon development environment on the Android platform.


Neon Technology

Neon is ARM's SIMD (single instruction, multiple data) extension: a hardware feature, developed by ARM, that accelerates data-parallel software.

Today, most mobile phone CPUs support Neon. From the hardware's perspective, the Neon co-processor offers:

  • Wide registers: a D register is 64 bits wide and a Q register is 128 bits wide;
  • Parallel memory access: according to ARM, a Q register can be loaded with 128 bits of contiguous memory at once;
  • Parallel ALU operation: according to ARM, Neon can perform four integer additions at once.

If your CPU has a VFP (Vector Floating Point unit, another co-processor in the ARM architecture), Neon can cooperate with it to provide floating-point support.
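The register widths and the "four additions at once" claim can be made concrete with a minimal sketch. The function name `addFourInts` and the helper `lanes` are hypothetical, and the scalar fallback is an assumption added only so that the snippet also builds on a non-ARM host; it is not part of the original setup.

```cpp
#include <array>
#include <cstdint>

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#endif

// Number of elements ("lanes") that fit in one register.
constexpr int lanes(int registerBits, int elementBits) {
    return registerBits / elementBits;
}
static_assert(lanes(64, 32) == 2, "a D register holds two 32-bit values");
static_assert(lanes(128, 32) == 4, "a Q register holds four 32-bit values");

// Adds two groups of four 32-bit integers (hypothetical helper).
// With Neon enabled, the four additions are issued as one vector add on Q registers;
// otherwise a plain loop produces the same result.
std::array<int32_t, 4> addFourInts(const std::array<int32_t, 4>& a,
                                   const std::array<int32_t, 4>& b) {
    std::array<int32_t, 4> out{};
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    int32x4_t va = vld1q_s32(a.data());        // load 128 bits = 4 x int32
    int32x4_t vb = vld1q_s32(b.data());        // load 128 bits = 4 x int32
    vst1q_s32(out.data(), vaddq_s32(va, vb));  // four additions, one instruction
#else
    for (int i = 0; i < 4; ++i) {
        out[i] = a[i] + b[i];
    }
#endif
    return out;
}
```

Calling `addFourInts` with two four-element arrays returns their element-wise sum, whichever path was compiled.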

Platform

My hardware platform is a mobile phone with a MediaTek X20 CPU, which has 2 Cortex-A72 cores and 8 Cortex-A53 cores and supports both Neon and VFP.
The software platform is Android Studio 2.1.3, Android SDK 24.0.1 and, I believe, NDK r12. NDK support is important because we are going to develop with the Java Native Interface (JNI). JNI is an interface between Java and C/C++ that lets us implement Java methods in C/C++.

Setting

Here is my Android.mk file:

LOCAL_PATH := $(call my-dir)
include $(CLEAR_VARS)

LOCAL_LDLIBS := -llog

# Enable neon instruction set, very important to Neon development.
LOCAL_ARM_NEON := true

LOCAL_MODULE := addTwoArray
LOCAL_SRC_FILES := addtwoarray.cpp

# Enable OpenMP if you like, doesn't matter here.
LOCAL_CFLAGS += -fopenmp
LOCAL_LDFLAGS += -fopenmp

# -llog is already linked above, so only libandroid needs to be added here.
LOCAL_LDLIBS += -landroid

include $(BUILD_SHARED_LIBRARY)
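To check that the Neon setting above actually reached the compiler, one option is a small compile-time probe. This is a sketch under assumptions: `simdBackend` is a hypothetical name, and it relies on the `__ARM_NEON` macro (or the older `__ARM_NEON__`) that GCC and Clang predefine when Neon code generation is enabled.

```cpp
#include <string>

// Reports which code path this translation unit was compiled for.
// Hypothetical helper: "neon" means the compiler was targeting Neon,
// "scalar" means the Neon flag did not take effect for this build.
std::string simdBackend() {
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    return "neon";
#else
    return "scalar";
#endif
}
```

Logging the result once at start-up (for example through the `LOGD` macro used later in this post) shows whether the library was built with Neon enabled.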

Code

Java activity:

package com.yeephycho.twoarraysum;

import android.support.v7.app.AppCompatActivity;
import android.os.Bundle;
import android.util.Log;

public class MainActivity extends AppCompatActivity {
    @Override
    protected void onCreate(Bundle savedInstanceState) {
        super.onCreate(savedInstanceState);
        setContentView(R.layout.activity_main);
        Log.d("Neon_Environment_Set_Up", "onCreate: " + Calculator.addTwoArrays(new float[]{1.0f}, new float[]{1.0f}));
    }
}

Java method definition:

package com.yeephycho.twoarraysum;

public class Calculator {
    static {
        System.loadLibrary("addTwoArray");
    }

    public static native float addTwoArrays(float[] a, float[] b);
}
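The native function that implements this method must carry a name derived from the package, class and method names. The sketch below reproduces the JNI naming rule for plain ASCII names. `jniMangle` and `jniFunctionName` are hypothetical helpers written for illustration; they do not handle non-ASCII characters or overloaded native methods (which need an extra signature suffix).

```cpp
#include <string>

// Escapes one name following the JNI rules for ASCII characters:
//   '.' or '/' -> "_",  '_' -> "_1",  ';' -> "_2",  '[' -> "_3"
// (hypothetical helper; non-ASCII escapes are intentionally left out).
std::string jniMangle(const std::string& name) {
    std::string out;
    for (char ch : name) {
        switch (ch) {
            case '.':
            case '/': out += "_";  break;
            case '_': out += "_1"; break;
            case ';': out += "_2"; break;
            case '[': out += "_3"; break;
            default:  out += ch;   break;
        }
    }
    return out;
}

// Builds the symbol that the JVM looks up for a non-overloaded native method.
std::string jniFunctionName(const std::string& qualifiedClass,
                            const std::string& method) {
    return "Java_" + jniMangle(qualifiedClass) + "_" + jniMangle(method);
}
```

For the class above, `jniFunctionName("com.yeephycho.twoarraysum.Calculator", "addTwoArrays")` yields the function name used in the C++ file of this post.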

JNI method implementation

Note that I repeat the computation in a few for loops so that the difference in the timing results is easy to see.

#include "com_yeephycho_twoarraysum_Calculator.h"

#include <stdio.h>
#include <time.h>

#include <stdlib.h>
#include <android/log.h>
#include <arm_neon.h>

#include <omp.h>

#define TAG "Neon_Environment_Set_Up"
#define LOGD(...) __android_log_print(ANDROID_LOG_DEBUG,TAG ,__VA_ARGS__)

extern "C" {
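// Time profiling function, returns the current time in milliseconds.
// CLOCK_MONOTONIC is used because, unlike CLOCK_REALTIME, it cannot jump
// when the system clock is adjusted during a measurement.
static double currentTimeInMilliseconds() {
    struct timespec res;
    clock_gettime(CLOCK_MONOTONIC, &res);
    return 1000.0 * res.tv_sec + (double) res.tv_nsec / 1e6;
}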
// JNI standard format. Inputs are two input arrays, output is a float number.
JNIEXPORT jfloat JNICALL Java_com_yeephycho_twoarraysum_Calculator_addTwoArrays(JNIEnv *, jclass, jfloatArray, jfloatArray);
JNIEXPORT jfloat JNICALL Java_com_yeephycho_twoarraysum_Calculator_addTwoArrays(JNIEnv *, jclass, jfloatArray a, jfloatArray b){
    jfloat c = 0.01f; // Return value, will be printed through Android studio logcat.

    // Define the arrays. They are static so that roughly 1 MB of floats
    // does not have to fit on the calling thread's stack.
    static float array_1[136000];
    static float array_2[136000]; // Note that memset cannot fill them with 1.0f/2.0f because they hold floats.

    for(int t = 0; t < 50; t++){
        for(int i = 0; i < 136000; i++){
            array_1[i] = 1.0f;
            array_2[i] = 2.0f;
        }
    }

    unsigned long long time[50]; // An empty array to cache time profiling information.
    unsigned long long start, end;

    for(int t = 0; t < 50; t++){
        start = currentTimeInMilliseconds();
        for(int index = 0; index < 100; index ++){
            for(int i = 0; i < 136000; i++){
                array_1[i] += array_2[i];
            }
        }
        end = currentTimeInMilliseconds();
        time[t] = end - start;
    }
    for(int t = 1; t < 50; t++){
        time[0] += time[t];
    }
    c = array_1[0]; // Return the first float number of array_1.

    LOGD("PURE C TIME COST = %llu", time[0]);
    LOGD("return value = %lf",c);

    // Clean array_1 and array_2.
    for(int t = 0; t < 50; t++){
        for(int i = 0; i < 136000; i++){
            array_1[i] = 1.0f;
            array_2[i] = 2.0f;
        }
    }

    float32x4_t n_array_1; // Two Neon Q registers, each holding four 32-bit floats.
    float32x4_t n_array_2;

    for(int t = 0; t < 50; t++){
        start = currentTimeInMilliseconds();
        for(int index = 0; index < 100; index ++){
            for(int i = 0; i < 136000; i+=4){
                n_array_1 = vld1q_f32(array_1+i); // Load four floats from array_1.
                n_array_2 = vld1q_f32(array_2+i); // Load four floats from array_2.
                n_array_1 = vaddq_f32(n_array_1, n_array_2); // Add the four pairs in parallel.
                vst1q_f32(array_1+i, n_array_1); // Store the four results back to array_1.
            }
        }
        end = currentTimeInMilliseconds();
        time[t] = end - start;
    }

    for(int t = 1; t < 50; t++){
        time[0] += time[t];
    }
    c = array_1[0];

    LOGD("NEON INTRINSIC TIME COST = %llu", time[0]);
    LOGD("return value = %lf",c);
    return c;
}

} // End of extern "C", JNI standard format.

In the Android Studio logcat window, search for “Neon_Environment_Set_Up”. On my PC, it shows:

[Figure: Profiling of Neon under Android for the floating-point addition operation]

If you are interested, try multi-level parallel computing by adding OpenMP. Redesigning the program may let it take better advantage of your CPU's multi-issue and branch prediction features, although this depends heavily on your platform. Neon inline assembly may provide even better performance.

Cheers!

License


The content of this blog itself is licensed under the Creative Commons Attribution 4.0 International License.

The containing source code (if applicable) and the source code used to format and display that content is licensed under the Apache License 2.0.
Copyright [2016] [yeephycho]
Licensed under the Apache License, Version 2.0 (the “License”);
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
Apache License 2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an “AS IS” BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either
express or implied. See the License for the specific language
governing permissions and limitations under the License.